Text Encoding Format
A text encoding format specifies a way of formatting or algorithmically transforming a particular base encoding. For example, the UTF-7 format represents Unicode text for transmission through channels that can handle only 7-bit values. Other text encoding formats for Unicode include UTF-8 and the 16-bit and 32-bit formats. These transformations are not treated as different base encodings; they are different formats for representing the same base encoding. Like text encoding variant values, text encoding format values are specific to a particular text encoding base value or to a small set of base values. A text encoding format is defined by the TextEncodingFormat data type.
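To make the distinction between a base encoding and its formats concrete, the minimal C sketch below encodes one Unicode character in two of the Unicode formats: as a sequence of 8-bit values (UTF-8) and as 16-bit values, where a character outside the Basic Multilingual Plane becomes a UTF-16 surrogate pair. The character is the same in the base encoding; only its representation changes. The functions encode_utf8 and encode_utf16 are hypothetical illustrations, not part of the Text Encoding Conversion Manager, and UTF-7 is left out for brevity.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: write one Unicode scalar value as UTF-8.
   Returns the number of 8-bit values written (1 to 4). */
static size_t encode_utf8(uint32_t cp, uint8_t out[4]) {
    /* Reject surrogate code points and values beyond U+10FFFF. */
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x80) {
        out[0] = (uint8_t)cp;
        return 1;
    }
    if (cp < 0x800) {
        out[0] = (uint8_t)(0xC0 | (cp >> 6));
        out[1] = (uint8_t)(0x80 | (cp & 0x3F));
        return 2;
    }
    if (cp < 0x10000) {
        out[0] = (uint8_t)(0xE0 | (cp >> 12));
        out[1] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
        out[2] = (uint8_t)(0x80 | (cp & 0x3F));
        return 3;
    }
    out[0] = (uint8_t)(0xF0 | (cp >> 18));
    out[1] = (uint8_t)(0x80 | ((cp >> 12) & 0x3F));
    out[2] = (uint8_t)(0x80 | ((cp >> 6) & 0x3F));
    out[3] = (uint8_t)(0x80 | (cp & 0x3F));
    return 4;
}

/* Hypothetical helper: write one Unicode scalar value as 16-bit units.
   BMP characters take one unit; non-BMP characters take a surrogate pair. */
static size_t encode_utf16(uint32_t cp, uint16_t out[2]) {
    assert(cp <= 0x10FFFF && (cp < 0xD800 || cp > 0xDFFF));
    if (cp < 0x10000) {
        out[0] = (uint16_t)cp;
        return 1;
    }
    cp -= 0x10000;                              /* 20 bits remain */
    out[0] = (uint16_t)(0xD800 | (cp >> 10));   /* high surrogate: top 10 bits */
    out[1] = (uint16_t)(0xDC00 | (cp & 0x3FF)); /* low surrogate: bottom 10 bits */
    return 2;
}
```

For example, U+1F600 becomes the bytes F0 9F 98 80 in UTF-8 and the 16-bit units D83D DE00 in the 16-bit format; in a 32-bit format it would be the single value 0x0001F600.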
typedef UInt32 TextEncodingFormat;
The function GetTextEncodingFormat (page 52) returns the text encoding format of a text encoding specification. The following enumeration defines constants for specifying text encoding formats:
enum {
    /* Default TextEncodingFormat for any TextEncodingBase */
    kTextEncodingDefaultFormat = 0,
    /* Formats for Unicode encodings */
    kUnicode16BitFormat = 0,
    kUnicodeUTF7Format = 1,
    kUnicodeUTF8Format = 2,
    kUnicode32BitFormat = 3
};
Constant descriptions
kTextEncodingDefaultFormat
- The standard default format for any base encoding.
For Unicode and ISO 10646
kUnicode16BitFormat
- The 16-bit character encoding format specified by the Unicode standard, equivalent to the UCS-2 format for ISO 10646. It also supports the UTF-16 method of representing characters outside the Basic Multilingual Plane (non-BMP characters) as surrogate pairs of 16-bit values.
kUnicodeUTF7Format
- The Unicode transformation format in which characters are represented by a sequence of 7-bit values. The Unicode Converter cannot handle this format; only the Text Encoding Converter can.
kUnicodeUTF8Format
- The Unicode transformation format in which characters are represented by a sequence of 8-bit values.
kUnicode32BitFormat
- The UCS-4 32-bit format defined for ISO 10646. This format is not currently supported.
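The C sketch below illustrates, under stated assumptions, how a format value might be packed into and recovered from a 32-bit text encoding specification, in the spirit of GetTextEncodingFormat. The bit layout (base in bits 0-15, variant in bits 16-25, format in bits 26-31) and the Unicode base value 0x0100 are assumptions meant to mirror Apple's later TextCommon.h headers; this section does not state them. The names make_spec, spec_format, unicode_format_name, needs_text_encoding_converter, format_supported, and UNICODE_BASE are hypothetical. The sketch also makes one practical consequence of the enumeration visible: because kTextEncodingDefaultFormat and kUnicode16BitFormat share the value 0, a C switch can use only one of them as a case label.

```c
#include <assert.h>
#include <stdint.h>

/* Format constants as listed above; uint32_t stands in for UInt32. */
typedef uint32_t TextEncodingFormat;
enum {
    kTextEncodingDefaultFormat = 0, /* default for any base encoding */
    kUnicode16BitFormat = 0,        /* same value as the default */
    kUnicodeUTF7Format = 1,
    kUnicodeUTF8Format = 2,
    kUnicode32BitFormat = 3
};

/* ASSUMED layout of a text encoding specification (not stated in this section). */
#define SPEC_BASE_MASK     0xFFFFu
#define SPEC_VARIANT_SHIFT 16
#define SPEC_VARIANT_MASK  0x3FFu /* 10 bits */
#define SPEC_FORMAT_SHIFT  26
#define SPEC_FORMAT_MASK   0x3Fu  /* 6 bits */

/* Hypothetical value standing in for the default Unicode base encoding. */
#define UNICODE_BASE 0x0100u

/* Hypothetical: pack base, variant, and format into one 32-bit specification. */
static uint32_t make_spec(uint32_t base, uint32_t variant, TextEncodingFormat format) {
    assert(base <= SPEC_BASE_MASK);
    assert(variant <= SPEC_VARIANT_MASK);
    assert(format <= SPEC_FORMAT_MASK);
    return base | (variant << SPEC_VARIANT_SHIFT) | (format << SPEC_FORMAT_SHIFT);
}

/* Hypothetical analogue of GetTextEncodingFormat: extract the format field. */
static TextEncodingFormat spec_format(uint32_t spec) {
    return (spec >> SPEC_FORMAT_SHIFT) & SPEC_FORMAT_MASK;
}

/* Hypothetical: name a Unicode format value. kTextEncodingDefaultFormat cannot
   appear as a second case label because it duplicates kUnicode16BitFormat. */
static const char *unicode_format_name(TextEncodingFormat format) {
    switch (format) {
    case kUnicode16BitFormat: return "16-bit";
    case kUnicodeUTF7Format:  return "UTF-7";
    case kUnicodeUTF8Format:  return "UTF-8";
    case kUnicode32BitFormat: return "UCS-4";
    default:                  return "unknown";
    }
}

/* Per the constant descriptions: UTF-7 needs the Text Encoding Converter. */
static int needs_text_encoding_converter(TextEncodingFormat format) {
    return format == kUnicodeUTF7Format;
}

/* Per the constant descriptions: the 32-bit format is not currently supported. */
static int format_supported(TextEncodingFormat format) {
    return format != kUnicode32BitFormat;
}
```

Under this assumed layout, a Unicode specification in UTF-8 format packs to 0x08000100, and spec_format recovers kUnicodeUTF8Format from it. A specification built with kTextEncodingDefaultFormat is indistinguishable from one built with kUnicode16BitFormat, which is why the default format for Unicode is the 16-bit format.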